
Add Idefics 3! #32473

Merged
merged 53 commits into huggingface:main from idefics3
Sep 25, 2024
Conversation

andimarafioti
Member

@andimarafioti andimarafioti commented Aug 6, 2024

What does this PR do?

Adding the Idefics 3 model.

There are still a few things to do before merging this PR: the results do not yet exactly match those from our codebase, and the tests are not finished. We are opening it now to unblock our release.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Models:

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@leloykun leloykun left a comment

just have a few comments regarding consistency with other VLMs like Chameleon + nits

Member

@zucchini-nlp zucchini-nlp left a comment

Yaaay, Idefics3 is here!

I just left a few comments to make the ongoing work on unifying VLMs a bit easier for us, but I didn't really review the whole PR.

src/transformers/models/idefics3/configuration_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/modeling_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/modeling_idefics3.py Outdated Show resolved Hide resolved
Comment on lines 940 to 953
    past_seen_tokens = 0
    return_legacy_cache = False
    if use_cache:
        if not isinstance(past_key_values, Cache):  # kept for BC (non `Cache` `past_key_values` inputs)
            return_legacy_cache = True
            past_key_values = DynamicCache.from_legacy_cache(past_key_values)
        past_seen_tokens = past_key_values.get_seq_length()

    if inputs_embeds is not None and input_ids is None and past_seen_tokens == 0:
Member

We don't have to support old-style cache for new models, can go directly with DynamicCache. Finally deprecated tuple cache in all decoder-only models, yay :)

And btw, the past cache length should be obtained from cache_position; afaik we'll stop using get_seq_length() some time in the future
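For context, the two approaches can be sketched with a toy example (this is hypothetical illustration code, not the transformers implementation; ToyDynamicCache is a made-up stand-in):

```python
class ToyDynamicCache:
    """Made-up stand-in for a KV cache that tracks its own sequence length."""

    def __init__(self):
        self._seq_length = 0

    def update(self, num_new_tokens):
        self._seq_length += num_new_tokens

    def get_seq_length(self):
        return self._seq_length


cache = ToyDynamicCache()
cache.update(5)  # prefill with a 5-token prompt

# Option 1: ask the cache object directly (what this PR ended up using).
past_seen_tokens = cache.get_seq_length()

# Option 2: derive it from cache_position, the running index of the new tokens;
# its first entry equals the number of tokens already cached.
cache_position = list(range(past_seen_tokens, past_seen_tokens + 3))  # 3 new tokens
past_from_position = cache_position[0]

print(past_seen_tokens, past_from_position)  # 5 5
```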

Member Author

Awesome! I removed the support for the old-style cache.

But I tried to use cache_position and got this error:
AttributeError: 'DynamicCache' object has no attribute 'cache_position'
So I reverted to get_seq_length()


Because past_key_values can be None, this raises an exception if we load the model with use_cache=False, e.g.:

Idefics3ForConditionalGeneration.from_pretrained(base_model_id, use_cache=False)

Member

I'll take care of cache-related changes after the PR is merged, as part of work going on "new-cache compatibility"

And yes, past_kv can be None, but the logic with get_seq_length() should work as long as we check for Noneness :)
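That Noneness check can be sketched as follows (hypothetical helper and class names, not the actual modeling code):

```python
def past_length(past_key_values):
    # Guard the no-cache case: use_cache=False leaves past_key_values as None,
    # so calling get_seq_length() on it directly would raise AttributeError.
    if past_key_values is None:
        return 0
    return past_key_values.get_seq_length()


class ToyCache:
    """Made-up cache stand-in with a fixed length."""

    def get_seq_length(self):
        return 7


print(past_length(None))        # 0  (use_cache=False path)
print(past_length(ToyCache()))  # 7
```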

src/transformers/models/idefics3/modeling_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/processing_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/processing_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/processing_idefics3.py Outdated Show resolved Hide resolved
Collaborator

@amyeroberts amyeroberts left a comment

Thanks for all the work adding this model!

The main comment is for the tests to be properly aligned with the new model behaviours, in particular the processor and image processor.

Some general nits for the modeling file - the main one being all classes and method which come from idefics2 should have # Copied from comments.

Before the PR can be merged, all slow tests will need to be run & pass. These should be triggered in subsequent commits (I might need to approve the workflow for them to run)

docs/source/en/model_doc/idefics3.md Outdated Show resolved Hide resolved
tests/models/idefics3/test_image_processing_idefics3.py Outdated Show resolved Hide resolved
tests/models/idefics3/test_image_processing_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/configuration_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/configuration_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/modeling_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/processing_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/processing_idefics3.py Outdated Show resolved Hide resolved
src/transformers/models/idefics3/processing_idefics3.py Outdated Show resolved Hide resolved
Comment on lines 155 to 162
    if isinstance(image, Image.Image):
        width, height = image.size
Collaborator

The images should never be Image.Image here

Member Author

I left this here as it is a custom transform for idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.

Member Author

I'm also raising a warning once if the input is not PIL images. But it will still work for numpy arrays or other types of inputs.

Member Author

Removed the warning as now it works perfectly with numpy arrays :)

Collaborator

This still should be removed - we shouldn't be processing PIL images

Member Author

removed!

Comment on lines 213 to 214
    if isinstance(image, Image.Image):
        cropped_image = image.crop((start_x, start_y, end_x, end_y))
Collaborator

Same here - there shouldn't be any PIL code for the transformations

Member Author

I left this here as it is a custom transform for idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.

Member Author

After further discussion with Amy, I added a large change to support processing the images as numpy arrays. Here: 5a0c0f4
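As a hedged illustration of why the numpy path can replace the PIL branch (this is not the PR's actual code; crop_numpy is a made-up helper), PIL's image.crop((left, upper, right, lower)) corresponds to plain numpy slicing in (height, width, channels) layout:

```python
import numpy as np


def crop_numpy(image, start_x, start_y, end_x, end_y):
    # Rows index y (height), columns index x (width), so slice height first.
    return image[start_y:end_y, start_x:end_x]


img = np.zeros((4, 6, 3), dtype=np.uint8)  # toy 4x6 RGB image
patch = crop_numpy(img, 1, 0, 4, 2)
print(patch.shape)  # (2, 3, 3)
```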

@andimarafioti
Member Author

updated main

@amyeroberts
Collaborator

@andimarafioti I can see that you re-requested review, but there are still some debugging commits being pushed, so I'll hold off on reviewing until this has been resolved. I'm going to unsubscribe to prevent getting notifications for every push - as soon as you have a question or want to let me know it's ready, just ping me with @amyeroberts and I'll get a notification :)

@andimarafioti andimarafioti force-pushed the idefics3 branch 3 times, most recently from ef576cb to 483d5d8 on August 16, 2024 at 07:36
@andimarafioti
Member Author

@amyeroberts ready to review! There are still the multi-GPU tests that are queued, but if those fail I would skip them. They run OOM in the CI on a single GPU, and there is an open issue for the same on idefics2: #32288. In case my fix here works, I already opened a PR to apply that fix there as well: #32840

@andimarafioti
Member Author

Talked to @molbap and he said that there is an issue with the multi-gpu workers getting stuck in the queue. But it's not related to this PR.

@amyeroberts
Collaborator

@andimarafioti Yes, unfortunately we're having issues at the moment with single GPU and multi-GPU runners taking a long time to run / never running cc @ydshieh.

It looks like the multi-GPU tests did eventually run, and there's at least one currently failing test to be addressed

@andimarafioti
Member Author

I rebased on main


# Copied from transformers.models.idefics2.modeling_idefics2.Idefics2PreTrainedModel._init_weights
def _init_weights(self, module):
    std = (


As it is, this always assigns self.config.text_config.initializer_range while, from what I understand, it should use self.config.initializer_range whenever hasattr(self.config, "initializer_range") is true. Is that possible?

Member Author

yes, you're right. This seems to also be a mistake on idefics2. Thanks!
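A minimal sketch of the fix being discussed, using made-up config classes (init_std is a hypothetical helper, not the transformers code): prefer the top-level config's initializer_range when it exists, and fall back to text_config otherwise.

```python
def init_std(config):
    # Prefer the top-level initializer_range if the config defines one.
    if hasattr(config, "initializer_range"):
        return config.initializer_range
    # Fall back to the nested text config otherwise.
    return config.text_config.initializer_range


class TextCfg:
    initializer_range = 0.02


class TopCfg:
    initializer_range = 0.01
    text_config = TextCfg()


class FallbackCfg:
    text_config = TextCfg()


print(init_std(TopCfg()))       # 0.01
print(init_std(FallbackCfg()))  # 0.02
```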

Collaborator

@amyeroberts amyeroberts left a comment

Looks great - I think we're good to go!

@andimarafioti andimarafioti merged commit f2c388e into huggingface:main Sep 25, 2024
24 checks passed
amyeroberts added a commit to amyeroberts/transformers that referenced this pull request Oct 2, 2024
* Add Idefics 3!

* fixes to make both pipelines identical

* fix for quantized models

* First pass at the review

* remove vocab size from the main config (it's still in the text_config)

* hot fix for merve

* Apply suggestions from code review

Co-authored-by: amyeroberts <[email protected]>

* re-add model_type for text_config

* remove support for old_cache

* remove hidden_size from main config

* rename idefics3 HF repo

* few changes suggested in the PR

* fix to input_data_format computation

* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion

* improve example

* few improvements from amy's review

* big change to enable processing input images as numpy arrays

* Changes to the code to uniformize processor kwargs

* image processing tests

* image processing tests fixes and some bugs they discovered

* addressed review comments from Yoni

* fix modeling tests

* remove special tokens that are not special

* fixes tests

* skip failing tests - they also fail for idefics2

* added paper and readded the tests with multi gpu, who knows

* Update docs/source/en/model_doc/idefics3.md

Co-authored-by: amyeroberts <[email protected]>

* Apply suggestions from code review

Co-authored-by: amyeroberts <[email protected]>

* review amy until image_processing_idefics3

* last comments from Amy

* review amy

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics3/modeling_idefics3.py

Co-authored-by: amyeroberts <[email protected]>

* Update docs/source/en/model_doc/idefics3.md

Co-authored-by: amyeroberts <[email protected]>

* doc improvement - amy review

* fix runtime error during fine-tuning

* amy's review

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/idefics3/modeling_idefics3.py

Co-authored-by: amyeroberts <[email protected]>

* ruff

* amy's comment on the order

* ruff ruff

* fix copies

* square images when they are not splitted

* ruff :(

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics3/test_processing_idefics3.py

Co-authored-by: amyeroberts <[email protected]>

* fix small bug introduced in refactor

* amy's image processing changes

* fixes peft tests and ruff

* modify to_pil_image from transformers. and review from emanuele.

* add modified to_pil_image

---------

Co-authored-by: amyeroberts <[email protected]>